[FEAT] Streaming Local Writes for Native Executor #2871
CodSpeed Performance Report: Merging #2871 will not alter performance.
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #2871 +/- ##
==========================================
- Coverage 76.34% 75.60% -0.74%
==========================================
Files 597 601 +4
Lines 71388 73589 +2201
==========================================
+ Hits 54504 55640 +1136
- Misses 16884 17949 +1065
68b18c1 to 8d80919
pub trait FileWriter: Send + Sync {
    fn write(&self, data: &Arc<MicroPartition>) -> DaftResult<()>;
    fn close(&self) -> DaftResult<Option<String>>;
}
Idk where to put this trait. Ideally it goes somewhere that already has Daft-Micropartition as a dependency, so not Daft-IO.
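For readers following the trait discussion, here is a minimal sketch of what an implementer might look like. `MicroPartition` and `DaftResult` below are simplified stand-ins rather than Daft's real types, `InMemoryWriter` is hypothetical, and reading the `Option<String>` returned by `close` as "path of the written file, if any" is an assumption. Because both methods take `&self` and the trait requires `Send + Sync`, any mutable state has to sit behind interior mutability such as a `Mutex`.

```rust
use std::sync::{Arc, Mutex};

// Stand-ins for Daft's types (assumptions for illustration only).
type DaftResult<T> = Result<T, String>;
pub struct MicroPartition {
    pub rows: Vec<String>,
}

pub trait FileWriter: Send + Sync {
    fn write(&self, data: &Arc<MicroPartition>) -> DaftResult<()>;
    fn close(&self) -> DaftResult<Option<String>>;
}

// Hypothetical writer that buffers rows in memory instead of touching disk.
pub struct InMemoryWriter {
    path: String,
    // `write` takes `&self`, so the buffer is guarded by a Mutex.
    buffer: Mutex<Vec<String>>,
}

impl InMemoryWriter {
    pub fn new(path: &str) -> Self {
        Self {
            path: path.to_string(),
            buffer: Mutex::new(Vec::new()),
        }
    }
}

impl FileWriter for InMemoryWriter {
    fn write(&self, data: &Arc<MicroPartition>) -> DaftResult<()> {
        let mut buf = self.buffer.lock().map_err(|e| e.to_string())?;
        buf.extend(data.rows.iter().cloned());
        Ok(())
    }

    fn close(&self) -> DaftResult<Option<String>> {
        let buf = self.buffer.lock().map_err(|e| e.to_string())?;
        // Assumed convention: None when nothing was written, else the path.
        Ok(if buf.is_empty() {
            None
        } else {
            Some(self.path.clone())
        })
    }
}
```

Since the trait only depends on `MicroPartition` and the result type, a sketch like this compiles wherever those two are in scope, which is the constraint raised in the comment above.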
Simple multithreaded vs. single-threaded Parquet writes using PyArrow.
Looks like PyArrow does release the GIL, so parallelism is possible. cc @samster25
Implements physical file writes. Supports unpartitioned and partitioned writes, with configurable row group and file sizes.
Unpartitioned writes:
Partitioned writes:
SizedDataWriter is responsible for closing and opening new files when the target file size is reached.
Notes:
max_open_files parameter, which we can support via LRU caching of the writers.
Memray stats (read -> write lineitem parquet)
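
The file-rolling behaviour attributed to SizedDataWriter in the description can be illustrated with a small sketch. This is not the PR's implementation: `RollingWriter`, its fields, and the `prefix-N.parquet` naming are hypothetical, and a "file" here is only a byte counter, so the example shows the close-and-reopen logic and nothing else.

```rust
// Hypothetical size-based rolling writer; no real I/O is performed.
pub struct RollingWriter {
    prefix: String,
    target_bytes: usize,       // roll to a new file once this is reached
    current_bytes: usize,      // bytes written to the currently open file
    file_index: usize,         // index of the currently open file
    closed_files: Vec<String>, // paths of files that have been closed
}

impl RollingWriter {
    pub fn new(prefix: &str, target_bytes: usize) -> Self {
        Self {
            prefix: prefix.to_string(),
            target_bytes,
            current_bytes: 0,
            file_index: 0,
            closed_files: Vec::new(),
        }
    }

    fn current_path(&self) -> String {
        format!("{}-{}.parquet", self.prefix, self.file_index)
    }

    // Record a write; close the current file and open the next one
    // as soon as the target size is reached.
    pub fn write(&mut self, chunk: &[u8]) {
        self.current_bytes += chunk.len();
        if self.current_bytes >= self.target_bytes {
            self.roll();
        }
    }

    fn roll(&mut self) {
        self.closed_files.push(self.current_path());
        self.file_index += 1;
        self.current_bytes = 0;
    }

    // Close the writer, flushing a partially filled last file if present.
    pub fn close(mut self) -> Vec<String> {
        if self.current_bytes > 0 {
            self.closed_files.push(self.current_path());
        }
        self.closed_files
    }
}
```

In this sketch the size check happens after each write, so a file can overshoot the target by up to one chunk. Whether the real writer checks before or after writing is not stated in the description.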